Search results for "Instruction set"
showing 10 items of 16 documents
First Results of Hyperspectral Scene Generation in Preparation of the Chime Imaging Spectrometer Mission
2021
End-To-End mission performance simulators (E2Es) are software tools developed to support satellite mission preparatory activities. For passive remote sensing missions, E2Es generate synthetic scenes simulating the interaction of the solar radiation between the atmosphere and the surface; therefore allowing the estimation of the mission performance before its launch. In this paper, we present the CHIME Scene Generator Module (SGM) as part of CHIME E2Es, with state-of-the-art parallelization and optimization that give a performance allowing to obtain a whole year of daily worldwide Top-Of-Atmosphere radiance images in a matter of hours. The CHIME SGM generates 100x200km hyperspectral scenes w…
S-Aligner: Ultrascalable Read Mapping on Sunway Taihu Light
2017
The availability and amount of sequenced genomes have been rapidly growing in recent years because of the adoption of next-generation sequencing (NGS) technologies that enable high-throughput short-read generation at highly competitive cost. Since this trend is expected to continue in the foreseeable future, the design and implementation of efficient and scalable NGS bioinformatics algorithms are important to research and industrial applications. In this paper, we introduce S-Aligner–a highly scalable read mapper designed for the Sunway Taihu Light supercomputer and its fourth-generationShenWei many-core architecture (SW26010). S-Aligner employs a combination of optimization techniques to o…
Architectural improvements and FPGA implementation of a multimodel neuroprocessor
2003
Since neural networks (NNs) require an enormous amount of learning time, various kinds of dedicated parallel computers have been developed. In the paper a 2-D systolic array (SA) of dedicated processing elements (PEs) also called systolic cells (SCs) is presented as the heart of a multimodel neural-network accelerator. The instruction set of the SA allows the implementation of several neural algorithms, including error back propagation and a self organizing feature map algorithm. Several special architectural facilities are presented in the paper in order to improve the 2-D SA performance. A swapping mechanism of the weight matrix allows the implementation of NNs larger than 2-D SA. A systo…
LightSpMV: Faster CSR-based sparse matrix-vector multiplication on CUDA-enabled GPUs
2015
Compressed sparse row (CSR) is a frequently used format for sparse matrix storage. However, the state-of-the-art CSR-based sparse matrix-vector multiplication (SpMV) implementations on CUDA-enabled GPUs do not exhibit very high efficiency. This has motivated the development of some alternative storage formats for GPU computing. Unfortunately, these alternatives are incompatible with most CPU-centric programs and require dynamic conversion from CSR at runtime, thus incurring significant computational and storage overheads. We present LightSpMV, a novel CUDA-compatible SpMV algorithm using the standard CSR format, which achieves high speed by benefiting from the fine-grained dynamic distribut…
Bit-Parallel Approximate Pattern Matching on the Xeon Phi Coprocessor
2014
Bit-parallel pattern matching encodes calculated values in bit arrays. This approach gains its efficiency by performing multiple updates within a machine word. An important parameter is therefore the machine word size (e.g. 32 or 64 bits). With the increasing length of vector registers, the efficient mapping of bit-parallel pattern matching algorithms onto modern high performance computing architectures is becoming increasingly important. In this paper, we investigate an efficient implementation of the Wu-Manber approximate pattern matching algorithm on the Intel Xeon Phi coprocessor. This architecture features a 512-bit long vector processing unit (VPU) as well as a large number of process…
Design methods of multithreaded architectures for multicore microcontrollers
2011
The development of electronic technology today has allowed the implementation of complex architectures, which led to the emergence of multicore processors technology. Multicore architectures are built from superscalar and multithreaded processors. Integrating new technologies in embedded applications requires the development of multicore processors that can be integrated into a smaller area like a classic microcontroller. These processors must manage fewer resources and be able to manage multiple tasks simultaneously. In this paper we present a method of modeling, simulation and evaluation of two multithreaded architectures with limited resources, which could be integrated into embedded sys…
Spectral evolution simulation on leading multi-socket, multicore platforms
2011
Spectral evolution simulations based on the observed Very Long Baseline Interferometry (VLBI) radio-maps are of paramount importance to understand the nature of extragalactic objects in astrophysics. This work analyzes the performance and scaling of a spectral evolution algorithm on three leading multi-socket, multi-core architectures. We evaluate three parallel models with different levels of data-sharing: a sharing approach, a privatizing approach and a hybrid approach. Our experiments show that the data-privatizing model is reasonably efficient on medium scale multi-socket, multi-core systems (up to 48 cores) while regardless algorithmic and scheduling optimizations, sharing approach is …
SWAPHI-LS: Smith-Waterman Algorithm on Xeon Phi coprocessors for Long DNA Sequences
2014
As an optimal method for sequence alignment, the Smith-Waterman (SW) algorithm is widely used. Unfortunately, this algorithm is computationally demanding, especially for long sequences. This has motivated the investigation of its acceleration on a variety of high-performance computing platforms. However, most work in the literature is only suitable for short sequences. In this paper, we present SWAPHI-LS, the first parallel SW algorithm exploiting emerging Xeon Phi coprocessors to accelerate the alignment of long DNA sequences. In SWAPHI-LS, we have investigated three parallelization approaches (naive, tiled, and distributed) in order to deeply explore the inherent parallelism within Xeon P…
Multiple modular very long instruction word processors based on field programmable gate arrays
2007
Modern field programmable gate array (FPGA) chips, with their large memory capacity and reconfigurability potential, are opening new frontiers in rapid prototyping of embedded systems. With the advent of high-density FPGAs, it is now possible to implement a high-performance very long instruction word (VLIW) processor core in an FPGA. This paper describes research results about enabling the DSP TMS320 C6201 model for real-time image processing applications by exploiting FPGA technology. We present a modular DSP C6201 VHDL model with a variable instruction set. We call this new development a minimum mandatory modules (M3) approach. Our goals are to keep the flexibility of DSP in order to shor…
Flexible VLIW processor based on FPGA for real-time image processing
2011
Modern FPGA chips, with their larger memory capacity and reconfigurability potential, are opening new frontiers in rapid prototyping of embedded systems. With the advent of high density FPGAs it is now possible to implement a high performance Very Long Instruction Word (VLIW) processor core in an FPGA. With VLIW architecture, the processor effectiveness depends on the ability of compilers to provide sufficient Instruction Level Parallelism (ILP) from program code. This paper describes research result about enabling the VLIW processor model for real-time processing applications by exploiting FPGA technology. Our goals are to keep the flexibility of processors in order to shorten the developm…